Refactor LeanAI to production-ready MLOps pipeline with modular training #55
Open
- Replace placeholder stubs with real implementations (modeling/train.py, modeling/predict.py, dataset.py, plots.py, drift_detection.py, retrain_flow.py)
- Fix API security: remove debug print, tighten CORS, add /health endpoint
- Upgrade Dockerfiles from Python 3.9 to 3.12 with multi-service compose
- Add pytest test suite (test_api.py, test_dataset.py, test_modeling.py)
- Add GitHub Actions CI/CD (lint, test, docker build)
- Consolidate dependencies in pyproject.toml with optional groups
- Fix pixi.toml for cross-platform support (linux, macos, windows)
- Improve Makefile with full dev workflow commands
- Update README with architecture diagrams and quick start guide
https://claude.ai/code/session_018HH3y8TvMgXG3PEmdEjoRS
- Fix encode_sex() to handle pandas 3.0 StringDtype (not just object)
- Fix API: use modern Starlette TemplateResponse signature (request, name, context)
- Clear broken __init__.py (referenced non-existent leanai module)
- Improve test data for outlier test
- Add ruff exclusions for legacy MLflow artifacts and pre-existing scripts
- Apply ruff formatter across all new code
All 18 tests pass, ruff check clean.
https://claude.ai/code/session_018HH3y8TvMgXG3PEmdEjoRS
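The encode_sex() change itself is not visible in this conversation view. As a rough sketch of what dtype-tolerant encoding could look like, the function below accepts both the legacy object dtype and the dedicated string dtype. The signature, the "Sex" column name, and the M -> 1 / F -> 0 mapping are assumptions made for illustration, not the project's actual code.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def encode_sex(df: pd.DataFrame, column: str = "Sex") -> pd.DataFrame:
    """Return a copy of df with a text sex column mapped to 1/0.

    Hypothetical sketch: the column name and the M -> 1, F -> 0 mapping
    are assumptions, not the project's actual encoding.
    """
    out = df.copy()
    col = out[column]
    # Accept both the legacy object dtype and the dedicated string dtype,
    # instead of testing for object alone.
    if is_object_dtype(col) or is_string_dtype(col):
        normalized = col.str.strip().str.upper()
        out[column] = normalized.map({"M": 1, "F": 0})
    # Columns that are already numeric pass through unchanged.
    return out
```

Checking the dtype through pandas.api.types rather than comparing against object directly is one way to stay independent of which string representation a given pandas version infers.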
Summary
Comprehensive refactoring of the LeanAI body fat prediction project from a monolithic structure to a production-ready MLOps pipeline. The codebase has been reorganized into modular components with proper testing infrastructure and containerization support, while keeping the core SVR model, which achieves <1% error (MAE: 0.10, R²: 0.9996).
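For reference, the two metrics quoted above are conventionally computed as in the sketch below. It is plain Python with hypothetical function names; it does not reproduce the PR's evaluation code, and it says nothing about how the reported values were obtained.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean target.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

MAE is expressed in the units of the target, while R² is unitless, so the two numbers answer different questions about the same predictions.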
Key Changes
Core ML Pipeline
- modeling/train.py: Extracted feature engineering (polynomial features, RFE, PCA) into reusable functions with explicit model building and evaluation
- modeling/predict.py: Separated inference logic with support for single predictions and batch processing
- dataset.py: Implemented feature engineering pipeline (BMI, waist-to-hip ratios, arm ratios) with outlier removal using z-score normalization
API & Deployment
- api/main.py: Simplified endpoint structure with proper CORS configuration, health checks, and environment-based model loading
- mlops/src/fastapi_app/main.py: Added MLflow model registry integration alongside local model loading
Testing & Quality
- tests/: Added pytest tests for API endpoints, dataset processing, and model training
- .github/workflows/ci.yml: GitHub Actions workflow for linting, testing, and Docker builds
- pyproject.toml
MLOps Infrastructure
- mlops/src/monitoring/drift_detection.py: Evidently AI integration for data, target, and regression performance monitoring
- mlops/src/retraining/retrain_flow.py: Metaflow-based automated retraining triggered by drift detection
- mlops/src/train_and_log_mlflow.py: Model tracking and experiment logging
Documentation & Configuration
- requirements.txt and pyproject.toml with optional dependency groups (mlops, dev, viz)
- pixi.toml for Linux, macOS, and Windows with focused dependency set
Visualization
- plots.py: Refactored plotting functions for distributions, correlation heatmaps, target relationships, and outlier detection
Notable Implementation Details
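As an illustration of the dataset.py steps listed under Core ML Pipeline (BMI, waist-to-hip ratio, z-score outlier removal), a minimal standard-library sketch might look like the following. The column names (weight_kg, height_m, waist_cm, hip_cm), the function names, and the default threshold of 3.0 are assumptions, not taken from the PR, and the real module presumably operates on pandas DataFrames rather than lists of dicts.

```python
from statistics import mean, pstdev


def add_ratio_features(row: dict) -> dict:
    """Return a copy of row with BMI and waist-to-hip ratio added.

    Column names are hypothetical; BMI is weight (kg) / height (m) squared.
    """
    out = dict(row)
    out["bmi"] = row["weight_kg"] / row["height_m"] ** 2
    out["waist_to_hip"] = row["waist_cm"] / row["hip_cm"]
    return out


def remove_outliers_zscore(
    rows: list[dict], columns: list[str], threshold: float = 3.0
) -> list[dict]:
    """Drop rows whose |z-score| exceeds threshold in any listed column."""
    # Per-column mean and population standard deviation over all rows.
    stats = {}
    for col in columns:
        values = [r[col] for r in rows]
        stats[col] = (mean(values), pstdev(values))

    def is_inlier(row: dict) -> bool:
        for col, (mu, sigma) in stats.items():
            if sigma == 0:
                continue  # constant column: no row can be an outlier
            if abs(row[col] - mu) / sigma > threshold:
                return False
        return True

    return [r for r in rows if is_inlier(r)]
```

The statistics are computed once over the full input, so a row is judged against the unfiltered distribution; whether the project filters in a single pass or iteratively is not stated in the PR.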
https://claude.ai/code/session_018HH3y8TvMgXG3PEmdEjoRS